Contributors

Evan Collins ()

Kelly Farley ()

Ken Stier ()

The Dataset

Raw dataset: COVID-19 infection and death statistics from U.S. counties (sourced from NYT), combined with economic, education, and population data (sourced from various government agencies) and also survey responses about mask-wearing frequencies (sourced from NYT). 3141 complete observations on 19 metric variables and 6 categorical variables. To avoid any outliers due to population size differences between counties, all variables are scaled as a percentage of population. Variable descriptions can be found here.

Data of relevance for this pset:

Categorical Predictor 1 (rural_urban_code): The Rural-Urban Codes are numbered 1-9 according to descriptions provided by the USDA. We will regroup codes 1 through 9 as into three groups: (1) “Urban” for codes 1-3, (2) “Suburban” for codes 4-6, and (3) “Rural” for codes 7-9.

Categorical Predictor 2 (region): Each county is associated with a state name, which we will group into regions as defined by the U.S. Census Bureau: Northeast, Midwest, South, and West.

3 Continuous Response Variables: A NYT survey about masking behaviors asked people whether they wear a mask in public when they expect to be within 6 feet of another person and calculated the responses for each county for never, rarely, sometimes, frequently, and always mask. We choose to look at the extremes and will examine 3 continuous response variables: never mask, sometimes mask, and always mask.

raw <- readr::read_csv("https://evancollins.com/covid_and_demographics.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   X1 = col_character(),
##   County_Name = col_character(),
##   State_Name = col_character(),
##   FIPS = col_character()
## )
## See spec(...) for full column specifications.
# create categorical variables: rural-urban code (3 levels), region (4 variables)
raw <- 
  raw %>%
    mutate(region = case_when(
      State_Name %in% c("Washington", "Oregon", "California", "Nevada", "Idaho", "Montana", "Utah", "Arizona", "Wyoming", "Colorado", "New Mexico", "Alaska", "Hawaii") ~ "West",
      State_Name %in% c("North Dakota", "South Dakota", "Nebraska", "Kansas", "Minnesota", "Iowa", "Missouri", "Wisconsin", "Illinois", "Michigan", "Indiana", "Ohio") ~ "Midwest",
      State_Name %in% c("Texas", "Oklahoma", "Arkansas", "Louisiana", "Mississippi", "Tennessee", "Kentucky", "Alabama", "Georgia", "Florida", "South Carolina", "North Carolina", "Virginia", "West Virginia", "District of Columbia", "Delaware", "Maryland") ~ "South",
      State_Name %in% c("Pennsylvania", "New Jersey", "Connecticut", "Rhode Island", "Massachusetts", "New Hampshire", "Vermont", "Maine", "New York") ~ "Northeast"),
      rural_urban_code = case_when(
        Rural_Urban_Code_2013 %in% c(1, 2, 3) ~ "Urban",
        Rural_Urban_Code_2013 %in% c(4, 5, 6) ~ "Suburban",
        Rural_Urban_Code_2013 %in% c(7, 8, 9) ~ "Rural")
      )
raw$rural_urban_code <- as.factor(raw$rural_urban_code) # Rural is reference

1: Interactions Plot

interaction.plot(raw$rural_urban_code, raw$region, raw$Never_Wear_Mask_Survey,
  lwd = 3, col = c("red", "blue", "black", "green"), trace.label = "Region", 
  xlab = "Rural-Urban Setting", main = "Interaction Plot for Never Mask")

interaction.plot(raw$rural_urban_code, raw$region, raw$Sometimes_Wear_Mask_Survey,
  lwd = 3, col = c("red", "blue", "black", "green"), trace.label = "Region", 
  xlab = "Rural-Urban Setting", main = "Interaction Plot for Sometimes Mask")

interaction.plot(raw$rural_urban_code, raw$region, raw$Always_Wear_Mask_Survey,
  lwd = 3, col = c("red", "blue", "black", "green"), trace.label = "Region", 
  xlab = "Rural-Urban Setting", main = "Interaction Plot for Always Mask")

There does appear to be interaction between the rural-urban setting and the region, as evidenced by the non-parallel lines on the interaction plots for never masking, sometimes masking, and always masking. In particular, the West region seems to have behaviors that most contradict those of other regions, particularly in the Western suburbs.

2: Two-Way MANOVA

Univariate:

Anova(lm(Never_Wear_Mask_Survey ~ region*rural_urban_code, data = raw), type = 3)
## Anova Table (Type III tests)
## 
## Response: Never_Wear_Mask_Survey
##                         Sum Sq   Df  F value    Pr(>F)    
## (Intercept)             7.5571    1 2849.276 < 2.2e-16 ***
## region                  0.5692    3   71.535 < 2.2e-16 ***
## rural_urban_code        0.6754    2  127.318 < 2.2e-16 ***
## region:rural_urban_code 0.1324    6    8.321  5.75e-09 ***
## Residuals               8.2990 3129                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(lm(Sometimes_Wear_Mask_Survey ~ region*rural_urban_code, data = raw), type = 3)
## Anova Table (Type III tests)
## 
## Response: Sometimes_Wear_Mask_Survey
##                          Sum Sq   Df   F value  Pr(>F)    
## (Intercept)             10.8499    1 3966.9689 < 2e-16 ***
## region                   0.3800    3   46.3072 < 2e-16 ***
## rural_urban_code         0.2320    2   42.4113 < 2e-16 ***
## region:rural_urban_code  0.0294    6    1.7926 0.09662 .  
## Residuals                8.5580 3129                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Anova(lm(Always_Wear_Mask_Survey ~ region*rural_urban_code, data = raw), type = 3)
## Anova Table (Type III tests)
## 
## Response: Always_Wear_Mask_Survey
##                         Sum Sq   Df  F value    Pr(>F)    
## (Intercept)             59.119    1 4181.798 < 2.2e-16 ***
## region                   4.959    3  116.919 < 2.2e-16 ***
## rural_urban_code         2.979    2  105.361 < 2.2e-16 ***
## region:rural_urban_code  0.934    6   11.017 3.485e-12 ***
## Residuals               44.236 3129                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Univariate:

The region, rural-urban code, and their interaction are significant in all 3 univariate models for masking behaviors, with p-values << alpha = 0.05. Based on coefficients, region is a more important predictor of sometimes masking, then always masking, then never masking; rural-urban code is a more important predictor of never masking, then sometimes masking, then always masking. but overall a less important predictor than region. Their interaction is most important in never masking, then always masking, then sometimes masking.

Multivariate:

Overall, there are differences between region and rural-urban codes (all multivariate statistics are significant). All of the multivariate tests suggest there is an interaction effect between region and rural-urban code.

multimod <- lm(cbind(Never_Wear_Mask_Survey, Sometimes_Wear_Mask_Survey, Always_Wear_Mask_Survey) ~ region*rural_urban_code, data = raw)
summary(Anova(multimod), univariate=T)
## 
## Type II MANOVA Tests:
## 
## Sum of squares and products for error:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                   8.298969                   1.335098
## Sometimes_Wear_Mask_Survey               1.335098                   8.558012
## Always_Wear_Mask_Survey                -11.042686                 -11.266079
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                   -11.04269
## Sometimes_Wear_Mask_Survey               -11.26608
## Always_Wear_Mask_Survey                   44.23570
## 
## ------------------------------------------
##  
## Term: region 
## 
## Sum of squares and products for the hypothesis:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                  0.9675336                   1.031265
## Sometimes_Wear_Mask_Survey              1.0312653                   1.172121
## Always_Wear_Mask_Survey                -3.7941207                  -4.016547
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                   -3.794121
## Sometimes_Wear_Mask_Survey               -4.016547
## Always_Wear_Mask_Survey                  15.197097
## 
## Multivariate Tests: region
##                  Df test stat approx F num Df   den Df     Pr(>F)    
## Pillai            3 0.2828710 108.5832      9 9387.000 < 2.22e-16 ***
## Wilks             3 0.7240752 119.9592      9 7610.447 < 2.22e-16 ***
## Hotelling-Lawley  3 0.3714955 129.0190      9 9377.000 < 2.22e-16 ***
## Roy               3 0.3438340 358.6189      3 3129.000 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: rural_urban_code 
## 
## Sum of squares and products for the hypothesis:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                  0.9798586                  0.7168383
## Sometimes_Wear_Mask_Survey              0.7168383                  0.5430791
## Always_Wear_Mask_Survey                -2.7350157                 -2.0139890
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                   -2.735016
## Sometimes_Wear_Mask_Survey               -2.013989
## Always_Wear_Mask_Survey                   7.643303
## 
## Multivariate Tests: rural_urban_code
##                  Df test stat  approx F num Df den Df     Pr(>F)    
## Pillai            2 0.1625339  92.22958      6   6256 < 2.22e-16 ***
## Wilks             2 0.8378157  96.42712      6   6254 < 2.22e-16 ***
## Hotelling-Lawley  2 0.1931626 100.63771      6   6252 < 2.22e-16 ***
## Roy               2 0.1909774 199.12574      3   3128 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: region:rural_urban_code 
## 
## Sum of squares and products for the hypothesis:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                 0.13241775                 0.04391835
## Sometimes_Wear_Mask_Survey             0.04391835                 0.02941714
## Always_Wear_Mask_Survey               -0.21615727                -0.13410491
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                  -0.2161573
## Sometimes_Wear_Mask_Survey              -0.1341049
## Always_Wear_Mask_Survey                  0.9344807
## 
## Multivariate Tests: region:rural_urban_code
##                  Df test stat  approx F num Df   den Df     Pr(>F)    
## Pillai            6 0.0418657  7.380650     18 9387.000 < 2.22e-16 ***
## Wilks             6 0.9585755  7.405305     18 8844.977 < 2.22e-16 ***
## Hotelling-Lawley  6 0.0427549  7.424303     18 9377.000 < 2.22e-16 ***
## Roy               6 0.0254642 13.279579      6 3129.000 6.5347e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Type II Sums of Squares
##                           df Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## region                     3                0.96753                   1.172121
## rural_urban_code           2                0.97986                   0.543079
## region:rural_urban_code    6                0.13242                   0.029417
## residuals               3129                8.29897                   8.558012
##                         Always_Wear_Mask_Survey
## region                                 15.19710
## rural_urban_code                        7.64330
## region:rural_urban_code                 0.93448
## residuals                              44.23570
## 
##  F-tests
##                         Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## region                                  121.60                     214.28
## rural_urban_code                        123.15                      99.28
## region:rural_urban_code                  16.64                       5.38
##                         Always_Wear_Mask_Survey
## region                                   179.16
## rural_urban_code                          90.11
## region:rural_urban_code                   11.02
## 
##  p-values
##                         Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## region                  < 2.22e-16             < 2.22e-16                
## rural_urban_code        < 2.22e-16             < 2.22e-16                
## region:rural_urban_code 1.0011e-10             0.0046608                 
##                         Always_Wear_Mask_Survey
## region                  < 2.22e-16             
## rural_urban_code        < 2.22e-16             
## region:rural_urban_code 3.4855e-12

3 Contrasts (Univariate and Multivariate)

Let’s do the following comparisons:

  • Multivariate - Rural vs. Suburban

  • Multivariate - Rural vs. Urban

  • Univariate - Rural vs. Urban

  • Multivariate - Interaction between Rural and Urban between regions Northeast and South

  • Univariate - Interaction between Rural and Urban between regions Northeast and South

First, let’s make a variable catcomb that combines both categorical variables.

raw$catcomb <- paste(raw$rural_urban_code, raw$region, sep = "")
table(raw$catcomb)
## 
##      RuralMidwest    RuralNortheast        RuralSouth         RuralWest 
##               443                29               406               199 
##   SuburbanMidwest SuburbanNortheast     SuburbanSouth      SuburbanWest 
##               310                58               424               107 
##      UrbanMidwest    UrbanNortheast        UrbanSouth         UrbanWest 
##               302               130               592               141
options(contrasts = c("contr.treatment", "contr.poly"))

# Make catcomb a factor
raw$catcomb <- as.factor(raw$catcomb) # RuralMidwest is reference level

# Multivariate - Fit one way MANOVA model
multimod2 <- lm(cbind(Never_Wear_Mask_Survey, Sometimes_Wear_Mask_Survey, Always_Wear_Mask_Survey) ~ catcomb, data = raw)
# Univariate - Fit one way ANOVA model just for Never_Wear_Mask_Survey
multimodNever <- lm(Never_Wear_Mask_Survey ~ catcomb, data = raw)

contrasts(raw$catcomb)
##                   RuralNortheast RuralSouth RuralWest SuburbanMidwest
## RuralMidwest                   0          0         0               0
## RuralNortheast                 1          0         0               0
## RuralSouth                     0          1         0               0
## RuralWest                      0          0         1               0
## SuburbanMidwest                0          0         0               1
## SuburbanNortheast              0          0         0               0
## SuburbanSouth                  0          0         0               0
## SuburbanWest                   0          0         0               0
## UrbanMidwest                   0          0         0               0
## UrbanNortheast                 0          0         0               0
## UrbanSouth                     0          0         0               0
## UrbanWest                      0          0         0               0
##                   SuburbanNortheast SuburbanSouth SuburbanWest UrbanMidwest
## RuralMidwest                      0             0            0            0
## RuralNortheast                    0             0            0            0
## RuralSouth                        0             0            0            0
## RuralWest                         0             0            0            0
## SuburbanMidwest                   0             0            0            0
## SuburbanNortheast                 1             0            0            0
## SuburbanSouth                     0             1            0            0
## SuburbanWest                      0             0            1            0
## UrbanMidwest                      0             0            0            1
## UrbanNortheast                    0             0            0            0
## UrbanSouth                        0             0            0            0
## UrbanWest                         0             0            0            0
##                   UrbanNortheast UrbanSouth UrbanWest
## RuralMidwest                   0          0         0
## RuralNortheast                 0          0         0
## RuralSouth                     0          0         0
## RuralWest                      0          0         0
## SuburbanMidwest                0          0         0
## SuburbanNortheast              0          0         0
## SuburbanSouth                  0          0         0
## SuburbanWest                   0          0         0
## UrbanMidwest                   0          0         0
## UrbanNortheast                 1          0         0
## UrbanSouth                     0          1         0
## UrbanWest                      0          0         1
levels(raw$catcomb)
##  [1] "RuralMidwest"      "RuralNortheast"    "RuralSouth"       
##  [4] "RuralWest"         "SuburbanMidwest"   "SuburbanNortheast"
##  [7] "SuburbanSouth"     "SuburbanWest"      "UrbanMidwest"     
## [10] "UrbanNortheast"    "UrbanSouth"        "UrbanWest"
# Get multivariate contrast for Rural vs. Suburban
linearHypothesis(multimod2, "catcombRuralNortheast + catcombRuralSouth + catcombRuralWest - catcombSuburbanMidwest - catcombSuburbanNortheast - catcombSuburbanSouth - catcombSuburbanWest = 0")
## 
## Sum of squares and products for the hypothesis:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                 0.06956023                0.020719275
## Sometimes_Wear_Mask_Survey             0.02071928                0.006171462
## Always_Wear_Mask_Survey               -0.20126412               -0.059948715
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                 -0.20126412
## Sometimes_Wear_Mask_Survey             -0.05994872
## Always_Wear_Mask_Survey                 0.58233336
## 
## Sum of squares and products for error:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                   8.298969                   1.335098
## Sometimes_Wear_Mask_Survey               1.335098                   8.558012
## Always_Wear_Mask_Survey                -11.042686                 -11.266079
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                   -11.04269
## Sometimes_Wear_Mask_Survey               -11.26608
## Always_Wear_Mask_Survey                   44.23570
## 
## Multivariate Tests: 
##                  Df test stat approx F num Df den Df     Pr(>F)    
## Pillai            1 0.0156343 16.55496      3   3127 1.1358e-10 ***
## Wilks             1 0.9843657 16.55496      3   3127 1.1358e-10 ***
## Hotelling-Lawley  1 0.0158826 16.55496      3   3127 1.1358e-10 ***
## Roy               1 0.0158826 16.55496      3   3127 1.1358e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For the multivariate tests between Rural and Suburban shown above, we can see that the Pillai, Wilks, Hotelling-Lawley, and Roy values all have p=1.1358e-10 < 0.05. Thus, we reject the null hypothesis and conclude Rural and Suburban are significantly different.

# Get multivariate contrast for Rural vs. Urban
linearHypothesis(multimod2, "catcombRuralNortheast + catcombRuralSouth + catcombRuralWest - catcombUrbanMidwest - catcombUrbanNortheast - catcombUrbanSouth - catcombUrbanWest = 0")
## 
## Sum of squares and products for the hypothesis:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                  0.4017247                  0.2808438
## Sometimes_Wear_Mask_Survey              0.2808438                  0.1963366
## Always_Wear_Mask_Survey                -1.2512379                 -0.8747344
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                  -1.2512379
## Sometimes_Wear_Mask_Survey              -0.8747344
## Always_Wear_Mask_Survey                  3.8971867
## 
## Sum of squares and products for error:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                   8.298969                   1.335098
## Sometimes_Wear_Mask_Survey               1.335098                   8.558012
## Always_Wear_Mask_Survey                -11.042686                 -11.266079
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                   -11.04269
## Sometimes_Wear_Mask_Survey               -11.26608
## Always_Wear_Mask_Survey                   44.23570
## 
## Multivariate Tests: 
##                  Df test stat approx F num Df den Df     Pr(>F)    
## Pillai            1 0.0840648 95.66564      3   3127 < 2.22e-16 ***
## Wilks             1 0.9159352 95.66564      3   3127 < 2.22e-16 ***
## Hotelling-Lawley  1 0.0917803 95.66564      3   3127 < 2.22e-16 ***
## Roy               1 0.0917803 95.66564      3   3127 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For the multivariate tests between Rural and Urban shown above, we can see that the Pillai, Wilks, Hotelling-Lawley, and Roy values all have p=2.22e-16 < 0.05. Thus, we reject the null hypothesis and conclude Rural and Urban are significantly different.

# Get univariate contrast for Rural vs. Urban
linearHypothesis(multimodNever, "catcombRuralNortheast + catcombRuralSouth + catcombRuralWest - catcombUrbanMidwest - catcombUrbanNortheast - catcombUrbanSouth - catcombUrbanWest = 0")
## Linear hypothesis test
## 
## Hypothesis:
## catcombRuralNortheast  + catcombRuralSouth  + catcombRuralWest - catcombUrbanMidwest - catcombUrbanNortheast - catcombUrbanSouth - catcombUrbanWest = 0
## 
## Model 1: restricted model
## Model 2: Never_Wear_Mask_Survey ~ catcomb
## 
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   3130 8.7007                                  
## 2   3129 8.2990  1   0.40172 151.46 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For the univariate F-test between Rural and Urban shown above, we can see that the p<2.2e-16 < 0.05. Thus, we reject the null hypothesis and conclude Rural and Urban are significantly different for Never_Wear_Mask_Survey.

#Get multivariate contrast for Northeast,South and Rural,Urban interaction
linearHypothesis(multimod2, "catcombRuralNortheast - catcombUrbanNortheast - catcombRuralSouth + catcombUrbanSouth = 0") 
## 
## Sum of squares and products for the hypothesis:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey               0.0003439765                0.001177750
## Sometimes_Wear_Mask_Survey           0.0011777504                0.004032532
## Always_Wear_Mask_Survey              0.0004625952                0.001583892
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                0.0004625952
## Sometimes_Wear_Mask_Survey            0.0015838923
## Always_Wear_Mask_Survey               0.0006221191
## 
## Sum of squares and products for error:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                   8.298969                   1.335098
## Sometimes_Wear_Mask_Survey               1.335098                   8.558012
## Always_Wear_Mask_Survey                -11.042686                 -11.266079
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                   -11.04269
## Sometimes_Wear_Mask_Survey               -11.26608
## Always_Wear_Mask_Survey                   44.23570
## 
## Multivariate Tests: 
##                  Df test stat approx F num Df den Df  Pr(>F)
## Pillai            1 0.0012273 1.280839      3   3127 0.27918
## Wilks             1 0.9987727 1.280839      3   3127 0.27918
## Hotelling-Lawley  1 0.0012288 1.280839      3   3127 0.27918
## Roy               1 0.0012288 1.280839      3   3127 0.27918

For the multivariate test above evaluating the interaction between Rural and Urban between regions Northeast and South, the difference is not shown to be significantly different, as the p value (0.27918) of the multivariate tests is greater than 0.05.

#Get univariate contrast for Northeast,South and Rural,Urban interaction
linearHypothesis(multimodNever, "catcombRuralNortheast - catcombUrbanNortheast - catcombRuralSouth + catcombUrbanSouth = 0")
## Linear hypothesis test
## 
## Hypothesis:
## catcombRuralNortheast - catcombRuralSouth - catcombUrbanNortheast  + catcombUrbanSouth = 0
## 
## Model 1: restricted model
## Model 2: Never_Wear_Mask_Survey ~ catcomb
## 
##   Res.Df    RSS Df  Sum of Sq      F Pr(>F)
## 1   3130 8.2993                            
## 2   3129 8.2990  1 0.00034398 0.1297 0.7188

For the univariate test above evaluating the interaction between Rural and Urban between regions Northeast and South, the difference in Never_Mask_Survey is not shown to be significantly different, as the p value (0.7188) is greater than 0.05.

4 Multiple-Response Linear Model

Let’s add two other continuous variables as covariates to the model and fit as a multiple-response linear model. We will include `andPercent_Adults_Bachelors_or_Higher` as covariates.

Let’s first plot the relationships between the covariates and the three response variables.

names(raw)
##  [1] "X1"                                                 
##  [2] "County_Name"                                        
##  [3] "State_Name"                                         
##  [4] "FIPS"                                               
##  [5] "Never_Wear_Mask_Survey"                             
##  [6] "Rarely_Wear_Mask_Survey"                            
##  [7] "Sometimes_Wear_Mask_Survey"                         
##  [8] "Frequently_Wear_Mask_Survey"                        
##  [9] "Always_Wear_Mask_Survey"                            
## [10] "Unemployment_Rate_2019"                             
## [11] "Median_Household_Income_2019"                       
## [12] "Median_Household_Income_Percent_of_State_Total_2019"
## [13] "Percent_Poverty_2019"                               
## [14] "Percent_Adults_Less_Than_HS"                        
## [15] "Percent_Adults_Bachelors_or_Higher"                 
## [16] "Population_2019"                                    
## [17] "Net_Migration_Rate_2019"                            
## [18] "Death_Rate_2019"                                    
## [19] "Birth_Rate_2019"                                    
## [20] "Rural_Urban_Code_2013"                              
## [21] "Economic_Typology_2015"                             
## [22] "Covid_Confirmed_Cases_as_pct"                       
## [23] "Covid_Deaths_as_pct"                                
## [24] "Covid_New_Cases_as_pct"                             
## [25] "Civilian_Labor_Force_2019_as_pct"                   
## [26] "region"                                             
## [27] "rural_urban_code"                                   
## [28] "catcomb"
# For Median_Household_Income_2019
ggplot(raw, aes(x=Never_Wear_Mask_Survey, y=Median_Household_Income_2019)) + geom_smooth(method = lm, color = "green") + geom_point(color = "red", cex=0.5, alpha=0.3) + labs(title="Never_Wear_Mask_Survey vs. Median Household Income")

ggplot(raw, aes(x=Sometimes_Wear_Mask_Survey, y=Median_Household_Income_2019)) + geom_smooth(method = lm, color = "green") + geom_point(color = "red", cex=0.5, alpha=0.3) + labs(title="Sometimes_Wear_Mask_Survey vs. Median Household Income")

ggplot(raw, aes(x=Always_Wear_Mask_Survey, y=Median_Household_Income_2019)) + geom_smooth(method = lm, color = "green") + geom_point(color = "red", cex=0.5, alpha=0.3) + labs(title="Always_Wear_Mask_Survey vs. Median Household Income")

# For Percent_Adults_Bachelors_or_Higher
ggplot(raw, aes(x=Never_Wear_Mask_Survey, y=Percent_Adults_Bachelors_or_Higher)) + geom_smooth(method = lm, color = "blue") + geom_point(color = "red", cex=0.5, alpha=0.3) + labs(title="Never_Wear_Mask_Survey vs. % Adults with Bachelor's or Higher")

ggplot(raw, aes(x=Sometimes_Wear_Mask_Survey, y=Percent_Adults_Bachelors_or_Higher)) + geom_smooth(method = lm, color = "blue") + geom_point(color = "red", cex=0.5, alpha=0.3) + labs(title="Sometimes_Wear_Mask_Survey vs. % Adults with Bachelor's or Higher")

ggplot(raw, aes(x=Always_Wear_Mask_Survey, y=Percent_Adults_Bachelors_or_Higher)) + geom_smooth(method = lm, color = "blue") + geom_point(color = "red", cex=0.5, alpha=0.3) + labs(title="Always_Wear_Mask_Survey vs. % Adults with Bachelor's or Higher")

options(contrasts = c("contr.sum", "contr.poly"))

multimod3 <- lm(cbind(Never_Wear_Mask_Survey, Sometimes_Wear_Mask_Survey, Always_Wear_Mask_Survey) ~ region*rural_urban_code + Median_Household_Income_2019 + Percent_Adults_Bachelors_or_Higher, data = raw)

#Multivariate results and univariate results with with type 3 Sum of squares
summary(Anova(multimod3, type = 3), univariate = T)
## 
## Type III MANOVA Tests:
## 
## Sum of squares and products for error:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                   8.024013                   1.052061
## Sometimes_Wear_Mask_Survey               1.052061                   8.248288
## Always_Wear_Mask_Survey                -10.274843                 -10.429628
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                   -10.27484
## Sometimes_Wear_Mask_Survey               -10.42963
## Always_Wear_Mask_Survey                   41.97602
## 
## ------------------------------------------
##  
## Term: (Intercept) 
## 
## Sum of squares and products for the hypothesis:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                   1.465860                   1.927422
## Sometimes_Wear_Mask_Survey               1.927422                   2.534319
## Always_Wear_Mask_Survey                  6.249751                   8.217641
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                    6.249751
## Sometimes_Wear_Mask_Survey                8.217641
## Always_Wear_Mask_Survey                  26.646065
## 
## Multivariate Tests: (Intercept)
##                  Df test stat approx F num Df den Df     Pr(>F)    
## Pillai            1  0.805661  4318.38      3   3125 < 2.22e-16 ***
## Wilks             1  0.194339  4318.38      3   3125 < 2.22e-16 ***
## Hotelling-Lawley  1  4.145645  4318.38      3   3125 < 2.22e-16 ***
## Roy               1  4.145645  4318.38      3   3125 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: region 
## 
## Sum of squares and products for the hypothesis:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                  0.8891676                  0.8878287
## Sometimes_Wear_Mask_Survey              0.8878287                  0.9500496
## Always_Wear_Mask_Survey                -3.4477626                 -3.4579297
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                   -3.447763
## Sometimes_Wear_Mask_Survey               -3.457930
## Always_Wear_Mask_Survey                  13.494766
## 
## Multivariate Tests: region
##                  Df test stat approx F num Df   den Df     Pr(>F)    
## Pillai            3 0.2577033  97.9518      9 9381.000 < 2.22e-16 ***
## Wilks             3 0.7457420 108.2628      9 7605.579 < 2.22e-16 ***
## Hotelling-Lawley  3 0.3363322 116.7322      9 9371.000 < 2.22e-16 ***
## Roy               3 0.3220783 335.7130      3 3127.000 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: rural_urban_code 
## 
## Sum of squares and products for the hypothesis:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                  0.1804586                 0.10959621
## Sometimes_Wear_Mask_Survey              0.1095962                 0.07673975
## Always_Wear_Mask_Survey                -0.6026318                -0.37800526
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                  -0.6026318
## Sometimes_Wear_Mask_Survey              -0.3780053
## Always_Wear_Mask_Survey                  2.0266368
## 
## Multivariate Tests: rural_urban_code
##                  Df test stat approx F num Df den Df     Pr(>F)    
## Pillai            2 0.0489154 26.12387      6   6252 < 2.22e-16 ***
## Wilks             2 0.9511373 26.42162      6   6250 < 2.22e-16 ***
## Hotelling-Lawley  2 0.0513174 26.71925      6   6248 < 2.22e-16 ***
## Roy               2 0.0502123 52.32121      3   3126 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: Median_Household_Income_2019 
## 
## Sum of squares and products for the hypothesis:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                 0.03812532                0.014684402
## Sometimes_Wear_Mask_Survey             0.01468440                0.005655864
## Always_Wear_Mask_Survey               -0.04490704               -0.017296456
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                 -0.04490704
## Sometimes_Wear_Mask_Survey             -0.01729646
## Always_Wear_Mask_Survey                 0.05289508
## 
## Multivariate Tests: Median_Household_Income_2019
##                  Df test stat approx F num Df den Df     Pr(>F)    
## Pillai            1 0.0053326 5.584584      3   3125 0.00081009 ***
## Wilks             1 0.9946674 5.584584      3   3125 0.00081009 ***
## Hotelling-Lawley  1 0.0053612 5.584584      3   3125 0.00081009 ***
## Roy               1 0.0053612 5.584584      3   3125 0.00081009 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: Percent_Adults_Bachelors_or_Higher 
## 
## Sum of squares and products for the hypothesis:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                 0.07095388                  0.1041369
## Sometimes_Wear_Mask_Survey             0.10413686                  0.1528385
## Always_Wear_Mask_Survey               -0.27609096                 -0.4052103
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                  -0.2760910
## Sometimes_Wear_Mask_Survey              -0.4052103
## Always_Wear_Mask_Survey                  1.0743065
## 
## Multivariate Tests: Percent_Adults_Bachelors_or_Higher
##                  Df test stat approx F num Df den Df     Pr(>F)    
## Pillai            1 0.0283637 30.40804      3   3125 < 2.22e-16 ***
## Wilks             1 0.9716363 30.40804      3   3125 < 2.22e-16 ***
## Hotelling-Lawley  1 0.0291917 30.40804      3   3125 < 2.22e-16 ***
## Roy               1 0.0291917 30.40804      3   3125 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: region:rural_urban_code 
## 
## Sum of squares and products for the hypothesis:
##                            Never_Wear_Mask_Survey Sometimes_Wear_Mask_Survey
## Never_Wear_Mask_Survey                 0.15414500                 0.05640962
## Sometimes_Wear_Mask_Survey             0.05640962                 0.03422600
## Always_Wear_Mask_Survey               -0.25568153                -0.15534870
##                            Always_Wear_Mask_Survey
## Never_Wear_Mask_Survey                  -0.2556815
## Sometimes_Wear_Mask_Survey              -0.1553487
## Always_Wear_Mask_Survey                  1.0127658
## 
## Multivariate Tests: region:rural_urban_code
##                  Df test stat  approx F num Df  den Df     Pr(>F)    
## Pillai            6 0.0460407  8.122949     18 9381.00 < 2.22e-16 ***
## Wilks             6 0.9544997  8.152075     18 8839.32 < 2.22e-16 ***
## Hotelling-Lawley  6 0.0471036  8.174216     18 9371.00 < 2.22e-16 ***
## Roy               6 0.0266350 13.881261      6 3127.00 1.2243e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Type III Sums of Squares
##                                      df Never_Wear_Mask_Survey
## (Intercept)                           1               1.465860
## region                                3               0.889168
## rural_urban_code                      2               0.180459
## Median_Household_Income_2019          1               0.038125
## Percent_Adults_Bachelors_or_Higher    1               0.070954
## region:rural_urban_code               6               0.154145
## residuals                          3127               8.024013
##                                    Sometimes_Wear_Mask_Survey
## (Intercept)                                         2.5343190
## region                                              0.9500496
## rural_urban_code                                    0.0767398
## Median_Household_Income_2019                        0.0056559
## Percent_Adults_Bachelors_or_Higher                  0.1528385
## region:rural_urban_code                             0.0342260
## residuals                                           8.2482885
##                                    Always_Wear_Mask_Survey
## (Intercept)                                      26.646065
## region                                           13.494766
## rural_urban_code                                  2.026637
## Median_Household_Income_2019                      0.052895
## Percent_Adults_Bachelors_or_Higher                1.074307
## region:rural_urban_code                           1.012766
## residuals                                        41.976018
## 
##  F-tests
##                                    Never_Wear_Mask_Survey
## (Intercept)                                        571.25
## region                                             346.51
## rural_urban_code                                    70.33
## Median_Household_Income_2019                        14.86
## Percent_Adults_Bachelors_or_Higher                  27.65
## region:rural_urban_code                             60.07
##                                    Sometimes_Wear_Mask_Survey
## (Intercept)                                            320.26
## region                                                 360.17
## rural_urban_code                                         9.70
## Median_Household_Income_2019                             2.14
## Percent_Adults_Bachelors_or_Higher                      19.31
## region:rural_urban_code                                 12.98
##                                    Always_Wear_Mask_Survey
## (Intercept)                                         992.50
## region                                              167.55
## rural_urban_code                                     75.49
## Median_Household_Income_2019                          0.66
## Percent_Adults_Bachelors_or_Higher                   40.02
## region:rural_urban_code                              12.57
## 
##  p-values
##                                    Never_Wear_Mask_Survey
## (Intercept)                        < 2.22e-16            
## region                             < 2.22e-16            
## rural_urban_code                   < 2.22e-16            
## Median_Household_Income_2019       0.00011827            
## Percent_Adults_Bachelors_or_Higher 1.5506e-07            
## region:rural_urban_code            1.2280e-14            
##                                    Sometimes_Wear_Mask_Survey
## (Intercept)                        < 2.22e-16                
## region                             < 2.22e-16                
## rural_urban_code                   2.2800e-06                
## Median_Household_Income_2019       0.14321118                
## Percent_Adults_Bachelors_or_Higher 2.0891e-12                
## region:rural_urban_code            0.00032053                
##                                    Always_Wear_Mask_Survey
## (Intercept)                        < 2.22e-16             
## region                             < 2.22e-16             
## rural_urban_code                   < 2.22e-16             
## Median_Household_Income_2019       0.68473479             
## Percent_Adults_Bachelors_or_Higher < 2.22e-16             
## region:rural_urban_code            4.6446e-14

In the above output chunk labeled “Term: region”, we can see region is a significant (p<2.22e-16) multivariate predictor.

In the above output chunk labeled “Term: rural_urban_code”, we can see rural-urban code is a significant (p<2.22e-16) multivariate predictor.

In the above output chunk labeled “Term: Median_Household_Income_2019”, we can see median household income is a significant (p=0.00081009) multivariate predictor.

In the above output chunk labeled “Term: Percent_Adults_Bachelors_or_Higher”, we can see median household income is a significant (p<2.22e-16) multivariate predictor.

In the above output chunk labeled “Term: region:rural_urban_code”, we can see the interaction between rural and rural-urban code is a significant (p<2.22e-16) multivariate predictor.

In the bottom chunk labeled “Type III Sums of Squares”, for each response variables, we can see the type III sum of squares, type III F-tests, and associated p values. From these univariate results, we can see that region is significant for each of the three response variables (Never_Wear_Mask_Survey, Sometimes_Wear_Mask_Survey, Always_Wear_Mask_Survey). Moreover, rural-urban code is significant for each of the three response variables. Median household income is significant just for Never_Wear_Mask_Survey. Percent of adults with a bachelor’s degree of higher is significant for all three response variables. And the interaction between region and rural-urban code is significant for each of the three response variables.